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Abstract 

Molecular entities work in concert as a system and mediate phenotypic outcomes and disease states. There has been recent 
interest in modelling the associations between molecular entities from their observed expression profiles as networks using 
a battery of algorithms. These networks have proven to be useful abstractions of the underlying pathways and signalling 
mechanisms. Noise is ubiquitous in molecular data and can have a pronounced effect on the inferred network. Noise can be 
an outcome of several factors including: inherent stochastic mechanisms at the molecular level, variation in the abundance 
of molecules, heterogeneity, sensitivity of the biological assay or measurement artefacts prevalent especially in high- 
throughput settings. The present study investigates the impact of discrepancies in noise variance on pair-wise 
dependencies, conditional dependencies and constraint-based Bayesian network structure learning algorithms that 
incorporate conditional independence tests as a part of the learning process. Popular network motifs and fundamental 
connections, namely: (a) common-effect, (b) three-chain, and (c) coherent type-l feed-forward loop (FFL) are investigated. 
The choice of these elementary networks can be attributed to their prevalence across more complex networks. Analytical 
expressions elucidating the impact of discrepancies in noise variance on pairwise dependencies and conditional 
dependencies for special cases of these motifs are presented. Subsequently, the impact of noise on two popular constraint- 
based Bayesian network structure learning algorithms such as Grow-Shrink (GS) and Incremental Association Markov 
Blanket (IAMB) that implicitly incorporate tests for conditional independence is investigated. Finally, the impact of noise on 
networks inferred from publicly available single cell molecular expression profiles is investigated. While discrepancies in 
noise variance are overlooked in routine molecular network inference, the results presented clearly elucidate their non- 
trivial impact on the conclusions that in turn can challenge the biological significance of the findings. The analytical 
treatment and arguments presented are generic and not restricted to molecular data sets. 
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Introduction 

Identifying associations and network structures from observa- 
tional data sets obtained across a given set of entities is a 
challenging problem and of great interest across a spectrum of 
disciplines including molecular biology [1—8]. While the molecular 
entities of interest are represented by the nodes, their associations 
are represented by the edges. Such networks can prove to be 
convenient abstractions of the underlying pathways and signalling 
mechanisms across distinct phenotypes and disease states. [1,2,7]. 
They can reveal interesting characteristics including repetitive 
structures, dominant players, community structures and generative 
mechanism [9-11] that can assist in developing meaningful 
interventions. 

Molecular data obtained from biological systems may or may 
not have explicit temporal information. While the former explicidy 
captures the evolution of the molecular activity as a function of 
time (dynamic), the latter represents a snapshot of the biological 
activity in a given window of time (static). Dynamic data sets are 
rare and challenging to generate since they demand controlling a 
number of factors. Static data sets in conjunction with multiple 
independent realizations are relatively easier to generate. Their 
prevalence may also be attributed to the tradition of genera- 
ting replicate measurements in molecular biology in order to 



demonstrate reproducibility of the findings. Prior studies on static 
data sets used pairwise dependency measures to capture the 
associations between a given set of molecules in the form of 
relevance networks [1]. The underlying hypothesis being that 
correlated genes are likely to be co-regulated or functionally 
related [12]. However, pairwise dependency measures by defini- 
tion are symmetric measures resulting in undirected graphs. It is also 
known that the dependency between a given pair of genes may not 
necessarily be direct and possibly mediated by other gene(s). This 
possibly motivated the choice of conditional dependencies as 
opposed to pairwise dependencies for molecular network infer- 
ence. Subsequendy, probabilistic approaches such as Bayesian 
network structure learning techniques that model the conditional 
dependencies across a larger number of variables in an automated 
manner were proposed to infer molecular networks from static 
data sets [3,6,7]. The resulting networks of constraint-based 
structure learning are typically in the form of directed acyclic graphs 
(DAGs) or partially directed acyclic graphs (PDAGs). While DAGs have 
directed edges, PDAGs have directed as well as undirected edges 
and accommodate the presence of equivalent classes [13,14]. 
Constraint-based structure-learning algorithms by their very 
nature do not accommodate the presence of cycles and feedback 
between the molecules of interest which is an inherent limitation. 
They have nevertheless proven to be useful approximations of 
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pathways and signalling mechanisms [6,7,13]. The DAGs 
(PDAGs) may also reveal possible causal relationships between the 
nodes under certain implicit assumptions [15]. 

Of interest, is to note that these molecular data sets are 
inherently noisy [16,17,18]. Noise and its variation across 
molecular entities may have contributions from several factors 
including stochastic mechanisms coupled to the systems dynamics, 
sensitivity and precision of the measurement device, variations in 
abundance of specific molecules, preferential binding affinities and 
experimental artefacts that are an outcome of the estimation 
process [7,19,20,21]. While identifying the source of noise is a 
challenging problem in its own merit, understanding its impact on 
network inference procedure is especially critical in order to avoid 
identification of spurious associations. In a recent study, we 
elucidated the non-trivial impact of noise and auto-regulatory 
feedback on networks inferred using Granger causality tests. The 
results were established on multivariate time series generated using 
gene network motifs modelled as vector auto-regressive processes 
(VAR) [22], as well as those inferred from cell-cycle microarray 
temporal gene expression profiles [23,24]. The present study 
investigates the impact of noise on pair-wise correlation, partial 
correlation and constraint-based structure learning algorithms by 
considering static data sets generated from linear models of 
popular network motifs and publicly available molecular expression 
data [7] . Network motifs are repetitive atomic structures that have 
been found to be prevalent across more complex networks [9] . In 
the present study, we consider three popular three-node motifs, 
namely: common-effect, three-chain and the coherent type-I feed-forward 
loop (FFL) [9,25,26]. The common-effect motif 'and the three-chain motif 
represent the convergent and serial connection respectively. These 
connections comprise the fundamental connections in Bayesian 
networks [27]. Furthermore, the conditional independence rela- 
tionships represented by these motifs are usually among the first to 
be examined in any constraint-based structure learning algorithm 
justifying their choice. Common-effect motif is also an essential 
ingredient in identifying equivalent classes and PDAGs [13]. The 
coherent type-I FFL has been shown to persist across a number of 
organisms including E. Coli and S. Cerivisiae [25,26]. Of interest, 
is to note that three-chain and common-effect motifs are an 
integral part of a type-I coherent FFL. Analytical expressions for 
large discrepancies in noise variance on pairwise (correlation 
coefficient) and conditional dependencies (partial correlation) are 
investigated. The impact of such discrepancies on constraint- 
based Bayesian network structure learning is also investigated. 
Finally, the presence of significant discrepancies in noise variance 
and its impact on network inference from experimental molecular 
expression profiles [7] is investigated. 



variations in a on pairwise and conditional dependencies is 
expected and not the goal of the present study. Discrepancies in 
the noise variances across the nodes are represented by parameters 

y,>0, i=l,2. 

Case 1: Common-effect network motif. The common- 
effect network motif (^-structure) [13] is a fundamental connection, 
Fig. la, discussed widely within the context of Bayesian network 
structure learning algorithms. For this motif, z is regulated by x 
and y given by the linear model, 
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The partial correlations are given by 



(3) 



For large noise limit at z(y 2 ->co) with finite noise at y(yj «72)> 
the correlation coefficients are given by 



lim p xv = 0 

y -»co 



Methods and Results 

Prior to investigating the impact of noise on the constraint- 
based Bayesian network structure learning algorithms, its impact 
on pairwise and conditional dependencies across the three network 
motifs is investigated. 

2.1 Pairwise and Conditional Dependencies 

Network Motif Parameters. In the following discussion, 
(x t ,y t ,z t ) represent the molecular expression of the three genes 
(x,y,z) respectively in a small time window ( T, T +t). The terms 
(e t ,rj t ,S t ) represent zero-mean, unit-variance uncorrelated noise 
attributed to inherent uncertainties and artifacts prevalent in 
molecular expression studies. Parameter(o: > 0) represents the 
transcriptional coupling strengths between the genes and is 
constrained to be equal across the genes, since the impact of 
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The partial correlations are given by 
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For large noise limit at y(yj — >co) with finite noise at z(y 2 «yi), 
the correlation coefficients are given by 

lim p xy = 0 



lim p x: = 0 



lim p = 1 



(6) 



The partial correlations are given by 



lim p ■■ 



lim p x - y - 



V+yf) 



lim Pvz.x= l 



(7) 



Remark 1 . Correlation coefficient estimates reveal significant pairwise 
dependencies across (x,z) and (y,z) in contrast to (x,y) resulting in the 
undirected graph x — z,y — z. As expected, conditioning the marginally 
independent nodes (x,y) on z renders them dependent (i.e.p xy Z ^Q). 

• (iJLarge noise limit at the common-effect node z(y 2 — >oo,yi «y 2 ): 
Pairwise as well as conditional dependencies vanish (4, 5) challenging any 
reliable conclusion on the network structure in the large noise limit when 
(y 2 — xXi,}'; «y 2 ) preventing any reliable inference of the network. More 
importantly, conditioning on the common-effect node at large noise levels 
did not render x and y dependent as expected (5). 

• (HJLarge noise limit at one of the causes z(y 2 — >oo,y] «y 2 ): Pairwise 
dependencies (x,y) as well as (x,z) disappear (6). Interestingly, 
conditional dependencies p xyz and p xzy are equal in magnitude with 
opposite signs and function qfy 2 (7). Pairwise as well as conditional 
dependencies p vz and p.„ x have maximal values of unity in the large noise 
limit at y. 

Case 2. Three-chain network motif 

Consider the three-chain network motif [9], Fig. lb, where y 
mediates the activity between (x,z) given by the linear model 
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For large noise limit at z(y 2 — >oo) with finite noise at y(yj «y 2 ), 
the correlation coefficients are given by 
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(12) 



(a) Common-Effect 




(b) Three-chain (c) Type I FFL 



Figure 1. Popular three-gene network motifs: common-effect, three-chain and coherent type-l feed-forward loop are shown in (a), (b) and (c) 
respectively. 

doi:10.1371/journal.pone.0080735.g001 
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For large noise limit at y(yj — ►oo) with finite noise at z(y 2 «y[), 
the correlation coefficients are given by 

lim p = 0 



lim p xz =0 



lim p y: = 1 



The partial correlations are given by 
lim p x „ z = 0 



lim p =0 
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The partial correlations are given by 
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For large noise limit at z(y 2 — >co) with finite noise at y(y, «y 2 ), 
the correlation coefficients and partial correlations are given by 



lim p yzjc =l 



(14) 
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Remark 2. Correlation coefficient estimates reveal significant pairwise 
dependencies across (x,yj, (y,z) and (x,z) resulting in the undirected graph 
x—y,y — z,x — z. As expected, conditioning the marginally dependent nodes 
(x,z) on y renders them independent (i.e.p xz} , = 0). This result is immune to 
the choice of the linear model parameters and reflects possible directed acyclic 
graph of the form X-*y-*Z. 

• (i). Large noise limit at the node z (y 2 — > go): Pairwise dependencies (11), 
(Pxy>Pxz>Pyz) are identical to the conditional dependencies in (12), 
(p X y. z ,p xz .y,Py Z . x )- Of interest is to note that pah-wise dependencies p xy 
and conditional dependency p xvz have identical non-zero magnitude. 

• (it). Large noise limit at the node y( 1 /i— >co): Pairwise dependencies 

(13) , (p X y,p xz ,p yz ) are identical to those of conditional dependencies 

(14) , (p X y, z ,p xz ,y,p yz , x ) similar to what was observed for (y 2 — >go). 
However, in contrast to (y 2 — >go), pairwise (p yz ) and conditional 
dependencies (p yz , x ) are identical with a maximum value similar to that 
of the common-effect network motif. Also, pair-wise dependencies 
(p xy ,P xz ,Py Z ) (13) are identical to those obtained for the common-effect 
motif (6) failing to distinguish these two structures. 

Case 3. Coherent Type-I feed-forward loop network motif 
Consider the coherent type-I feed-forward loop [25,26], Fig. lc, 
where the expression of y is regulated by x whereas those of z is 
regulated by x as well as z given by the linear model 



x, 




"0 


0 


0" 




x, 








yt 




a 


0 


0 




yt 


+ 




(15) 


_ Zt _ 




a 


a 


0 













The correlation coefficients are given by 
E(xy) a. 



Pxy 



Pxz : 



Pvz -- 



G X Oy 

. E(xz) 
o x a z 

E(yz) 



a 1 4 

J a 4 + 2o? + a 2 + a 2 y\ + > 



a 4- a + ay , 



,,2 



^ a 2 + yj ^ a 4 4- 2c/. 3 + a 2 + a 2 y 2 + y\ 



(16) 



lim p -- 



lim p xz =0 



lim p vz = 0 



The partial correlations are given by 



lim p -- 



lim p xzy = 0 



lim p yz x = 0 



a 2 + y 2 



(18) 



a 2 +y 2 



(19) 



For large noise limit at y(y, — >co) with finite noise at z(y 2 «y[), 
the correlation coefficients and partial correlations are given by 
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Remark 3. Correlation coefficient estimates reveal significant pairwise 
dependencies across (x.,y), (y,z) and (x,z) indicating a possible undirected 
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Table 1. Pair-wise and conditional dependencies across the three network motifs in the asymptotic noise limits. 





Ply 


Pxz 


Pyz 


P*?z 


P,z.y 




Common-Effect 


0 


0 


1 


—a 


a. 


1 
















Three-Chain 


0 


0 


1 


0 


0 


1 


Type 1 FFL 


0 


0 


1 


—a 


a 


1 












v^+y 2 ) 




y 2 ->oo 


Pxy 


Pa 


Pyz 


Pxy.z 




Pyz.x 


Common-Effect 


0 


0 


0 


0 


0 


0 



Three-Chain 



Type I FFL 



l^+n 



doi:1 0.1 371 /journal.pone.0080735.t001 



graph of the form x—y — z. Unlike the three-chain, conditioning (x,z) on y 
does not render them independent. 

• (i). Large noise limit at z(y 2 -*co): Pairwise dependencies (18) and 
conditional dependencies (19) are identical to those obtained for the three- 
chain motif (11, 12) failing to distinguish these structures for relatively 
large noise variance at z, Table 1. 

• (it). Large noise limit at y(j\— *oo): Pairwise dependencies (20) and 
conditional dependencies (21) are identical to those obtained for the 
common-effect motif (6), (7) failing to distinguish these structures for 
relatively large noise variance at y, Table 1. Also, the pairwise 
dependencies for (yj — >oo)£r identical for the coherent Type I FFL, three- 
chain as well as the common-effect motif. 



2.2 Constraint-based Bayesian Network Structure 
Learning 

Bayesian network structure learning algorithms have been used 
successfully to infer the associations between a large numbers of 
variables. Several such algorithms have been proposed in 
literature, a partial list of contributions include [28,29,30,31,32]. 
In the present discussion, we focus on constraint-based structure 
learning algorithms that infer the network structure using tests for 
conditional independence, namely: the Grow-Shrink (GS) algo- 
rithm [30] and the Incremental Association Markov Blanket 
(IAMB) [31]. 

GS was the first algorithm that learned the Markov blanket of each 
node as an intermediate step to speed up structure learning 
process. The Markov blanket Bl(X) of a node X is denned as the 
set of nodes that makes X independent from all the other nodes in 
the domain. In a Bayesian network, it is formed by the parents of 
X, its children, and the other parents of its children [15]. 
Therefore, the search for the neighbors of each node can be 
restricted to its Markov blanket, which in most cases contains a 
limited number of nodes. GS learns Markov blankets using a 
forward selection (Growing Phase) followed by a backward selection 
(Shrinking Phase). Conditional independence tests are performed in 
order of increasing complexity (i.e. with respect to the number of 
nodes involved in the test) in order to maximize the overall power 
of the structure learning algorithm. Markov blankets are then 
reduced to the corresponding set of neighbors by an additional 
backward selection. Arc directions are established starting from v- 
structures, which can be identified by the interplay of the causes 



conditional on their common effect, and then propagated to 
prevent the formation of further w-structures and enforce acyclicity. 
This is achieved using the heuristics described elsewhere [30,33]. 
IAMB introduces relatively better heuristics to identify Markov 
blankets while improving on GS by using a forward stepwise 
regression. However, IAMB in contrast to GS is designed to 
identify the Markov blanket of each node and not the complete 
network structure. Essentially, it performs the same task as the first 
step of GS but the forward stepwise selection in IAMB reduces the 
number of nodes incorrecdy included in the Markov blankets. In 
the context of Bayesian network structure learning, IAMB is 
extended to a complete learning algorithm by adding steps 2 to 4 
of GS. While both algorithms have been shown to be formally 
correct, IAMB has been recently supported by more extensive 
proofs and simulations [34,35]. Of interest is to note that GS as 
well as IAMB are highly dependent on the ability of the 
conditional independence tests to correctly identify dependence 
relationships. In fact, the proofs of correctness of both structure 
learning algorithms implicidy assume absence of type I or type II 
errors. Such an assumption can especially be violated in the 
presence of noise that may accentuate false-positives as well as 
false-negatives challenging the biological significance of the results. 
This in turn justifies investigating the impact of discrepancies in 
noise variance across the nodes on network inference using GS 
and IAMB. Since the conditional independence tests increase in 
complexity during the structure learning process across GS and 
IAMB [36] the present study is restricted to well-established 
network motifs that are prevalent across more complex structures. 
The concerns presented across these motifs are expected to be 
aggravated across more complex network topologies. 

Common-effect network motif 

For large noise limit at z(y 2 — *oo) with finite noise at y(j\ «y 2 )-' 
For relatively large noise variance at z, the pairwise as well as 
conditional dependencies (4, 5) vanish across GS as well as IAMB 
resulting in an empty network. This happens regardless of the 
values of (p X y.z>Pxz.y>Pyz.x) because both GS and IAMB test for 
significant pairwise dependencies (p X y,p x ?,Pyz) first and conclude 
the Markov blankets of x,y and z to be empty sets. As a 
consequence, none of the nodes have any neighbours resulting in 
an empty graph. 

For large noise limit at y(yj — >oo) with finite noise at z(y 2 «y[): 
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For relatively large noise variance at y, GS was able to retrieve a 
part of the network structure as discussed below. The Markov 
blankets inferred by GS are as follows: 

• For Bl(x) from (6) we have X-Ly, i.e. p xy = 0 and .v_l_z, i.e. p x - = 0 
resulting in BI(x) = 0. 

• For Bl{y), from (6) we have y Lx, i.e. p xy =0 and yz, i.e. p yz = \. 
As a result, z is added to Bl(y). Also from (7), yx given z, i.e. p xy z #0 
since 0£>0. Therefore,xis added to Bl{y) for suitable values of oc 
resulting in Bl(y) = {x,z} characteristic of the motif (1). 

• For B I (z), from (6) we have z_l_x, i.e. p x , = 0 butzy,i.e. p r = \.As 
a result,}' is added to Bl(z). Also from (7) p xz , y ^0, since oc > 0. 
Therefore, a suitable choice of oc results in the Markov blanket 
Bl(z) = {x,y} characteristic of the motif (1). 

For IAMB, the conditional independence tests are performed in 
a different order since the nodes are included in the Markov 
blankets in decreasing order of association. However, the resulting 
Markov blankets5/(x), Bl(y) and Bl(z) are same as those of GS. 
The impact of discrepancies in noise variance across the nodes on 
structure learning is especially elucidated by the asymmetry of the 
Markov blankets Bl(x) and Bl(y) as well as 5/(x)and Bl(z), 
Markov blankets are symmetric by definition, i.e.xeBl(y) then 
yeBl(x) and vice versa. However, for the present case we have 
following asymmetries (xeBl(y) while yfBl(x)) and (xeBl(z) 
while zfBl(x)) violating the definition of Markov blanket. For 
consistency, a symmetry correction [34,35] may be applied either by 
removing x from Bl(y) and£/(z), or adding^ and z to Bl(x). The 
latter correction enables faithful reproduction of the motif while 
the former does not. 

Three-Chain network motif 

For large noise limit at z(y 2 — >| X>) with finite noise at y(yj «y^):z 
For relatively larger noise variance at z, the Markov blankets 
inferred by GS are given as follows: 

• For Bl(x),fiom (11) we know that xy, i.e. p xy ^0 since a>0. For 
suitable choice of oc, we may correctly infer Bl(x) = {y}. Also, from (11, 
12) we have xLz, i.e. p xz = 0 and xLz\y, i.e. p xz y = 0 so z fBl (x) . 
Therefore, the ability to infer Bl (x) depends on oc. 

• For Bl(y), from (11, 12) we have yJ-Z, i.e. p y - = 0 and y.Lz\x, i.e. 
p yzx = 0 resulting in either Bl(y) = {x}orBl(y) = 0 markedly 
different from Bl(y) = {x,z} characteristic of the motif (8). 

• For Bl(z), from (11) we have z_Lx, i.e. p v _ = 0 and yA-Z, i.e. 
p =0resulti?ig in Bl x z) = 0 in contrast to Bl(z) = {}'} characteristic 
of the motif (8). 

As in the case of common-effect network motif, reordering of 
the conditional independence tests in IAMB does not result in 
Markov blankets different from those inferred by GS. Unlike 
common-effect motif, no asymmetry between the Markov blankets 
is observed for the three-chain, since xeBl(y) and yeBl(x) are 
established using the same correlation coefficient p xy . Given these 
set of Markov blankets, identifying the correct network structure is 
impossible. Since for large values of oc, both GS and IAMB learn 
(x—y,z), while for small values of a both GS and IAMB are unable 
to identify any of the arcs present in the true motif structure. The 
presence of at most a single arc x— y makes it impossible to infer 
its direction, since both GS and IAMB use ^-structures to infer 
directions and the learned motif structure contains none. 

For large noise limit at y(y l ->co) with finite noise at z(y 2 «yi).' 



For relatively large noise variance at y, no reliable conclusion of 
the motif is possible across GS as well as IAMB. The Markov 
blankets are as follows: 

• For Bl(x),fiom (13) we have x.Ly, i.e. p xv = 0 and xLz, p xz = 0. 
As a result, Bl(x) = 0 in contrast to Bl(x) = {_y} characteristic of the 
motif (8). 

• For Bl(y), from (13) we have y Lx, i.e. p xv = 0 but yz, i.e. p yz = \. 

Even after updating the Markov blanket to Bl(y) = {z}, the dependence 
between x andy is obscured by noise as p = 0. Therefore, the Markov 
blanket Bl(y) = {z}. 

• For Bl(z), from (13) we have that z_Lx, i.e. p xz = 0 but zy, i.e. 
p y - = 1 . Also, from (14) we have x.Lz\y, i.e. p xzy = 0. This results in 
the Markov blanket Bl(z) = {y} characteristic of the motif (8). 

In this case, no asymmetry is observed despite the effects of 
noise. Nevertheless, neither GS nor IAMB was able to learn the 
motif for relatively large noise variance. 

Coherent Type-I Feed-Forward Loop motif 

For large noise limit at z(y2~ > °o) with finite noise at y(j\ «yf) : 
For relatively large noise variance at z, the Markov blankets 
determined by GS and IAMB are as follows: 

• For Bl (x),from (18), x y i.e. p xy # 0, since a> 0. Also from (18, 19) 
we note that x_l_z, i.e. p xz = 0 and xLz\y, i.e.p xz v = 0. Therefore, z is 
not included in Bl(x). Thus, GS and IAMB return either Bl(x) = 0 or 
Bl(x) = {y} for suitable choice of oc in contrast to Bl(x) = {y,z} 
characteristic of the motif (15). 

• For Bl(y),from (18) xy, i.e.p xy # 0, since a. > 0. Also, from (18, 19) 
we have y^z, i.e. p yz = 0 andyLz\x, i.e. p yz x = Q. Therefore, z is 
not included in Bl(y) . Thus, GS and IAMB return either Bl (_y) =0 or 
Bl(y) = {x} for suitable choice of ocas opposed to Bl(y) = {x,z} 
characteristic of the motif (15). 

• ForBl(z), it is impossible to learn the correct Markov blanket 
Bl(z) = {x,y} sinceZ-Lx,i.e. p v _ = 0 as well as z.Ly,i.e. p y - = 0 
from (18). As a result, Bl(z) = 0. 

In the present case, discrepancy in noise variance does not result 
in asymmetry in the Markov blankets. Thus, symmetry correction 
may not alleviate the impact of noise. Possible motif structures 
corresponding to large discrepancies at z are either an empty 
structure or (x— y,z). This is problematic for two reasons. First, 
only one arc out of three is correctly identified and its direction 
cannot be determined by the learning algorithm. Second, the 
motif structures above are indistinguishable from those obtained 
for the three-chain network motif. 

For large noise limit at y(y\ — >oo) with finite noise at z(y2«yi).' 

For relatively large noise variance y, again neither GS nor 
IAMB was able to infer the motif. The Markov blankets are given 
as follows: 

• For Bl(x), from (20) we have X-Ly, i.e. p xy = 0 and X-Lz, i.e. 
p xz = 0. This results in Markov blanket Bl(x) = 0in contrast to 
Bl(x) = {y,z}For Bl(y), from (20) we have yLx, i.e. p xy = 0. 
However, y is dependent on z, i.e. p yz = 1. Also, from (21) we have 
p XV7 jt 0, since i > 0. This results in Markov blanket Bl(y) = {x,z} 
characteristic of the motif (15) for suitable choice of parameter oc 
characteristic of the motif (15). 

• For Bl(z), from (20) we have zA-X,i.e. p xz = 0. However, z is 
dependent on y, i.e.p yz = l. Also, from (21) p xvz ^0, since a>0. 
These results in turn result in Bl(z) = {x,^} for suitably large values of 
a. 
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Asymmetry between the Markov blankets is observed across 
Bl{x) and Bl(y) as well as between Bl(x) and S/(z). This can 
be attributed to the fact that xeBl(y) while y^Bl(x) and 
xeBl(z) while z/kBl(x) for suitably large values of a. Correcting 
this asymmetry by adding y and z to Bl{x) results in the Markov 
blankets characteristic of the motif. However, establishing their 
directions is not possible since the presence of an arc between x 
and y prevents both GS and IAMB from identifying x— >}><-z. 
As a result, all possible configurations of the arcs' directions are 
probabilistically equivalent resulting in an undirected graph. 
This is phenomenon is known as the shielded collider identification 
problem and affects all constraint-based learning algorithms 
[37]. 

2.3 Simulation Results 

In the following discussion, the three gene network motifs are 
generated using (1, 8, 15) with parameter (a = 0.5) and 
normally distributed noise. Since the objective is to demonstrate 
the impact of noise as opposed to the other parameter,(a = 0.5), 
is held constant across all the simulations. The noise variance at 
the node x is fixed at unit variance whereas those at y(yj > 0) 
and z(y 2 >0) are varied systematically in order to understand 
the impact of discrepancy in noise variance on the conclusions. 
Three distinct cases of noise variances, namely: {j\ = l,y 2 = 1), 
(y[ = 10,y2 = l) an d (y l = \,y 2 = l0) sire considered. The cases 
(yi = 10,72 = 1) and (y\ = l,y 2 = 10) correspond to large noise 
variance limits as discussed under (Cases 1, 2 and 3) whereas 
(yi = l,y 2 = l) corresponds to absence of discrepancies in noise 
variance. The conditional independence tests used in the 
following discussion is exact t-test for Pearson correlation as 
implemented in the R package bnlearn [38]. A description of 
the functions in bnlearn can be found in the accompanying 
manual with applications to molecular expression profiles in 
[39]. 

Results generated using constraint-based structure learning 
algorithms GS and IAMB were quite similar consistent with 
their expected behaviour, Section 2.2. Therefore, we discuss 
only the results from the GS algorithm. The networks were 
learned across 200 independent realizations of the data (sample 
size = 2000) and Friedman's confidence (i/f) [3] was computed for 
each of the edges. Friedman's confidence essentially represents 
the percentage of times an edge shows up across networks learnt 
independently from bootstrapped realizations. In the case of 
observational data sets, confidences are estimated from networks 
learned from nonparametric bootstraps of the given empirical 
sample. In the present study, the underlying model generating 
the networks is known a priori. Therefore, parametric bootstrap 
is used where independent realizations of the data were 
generated from the model in contrast to non-parametric 
bootstrap [40]. Also, in the present study, confidence estimates 
of edges known to be present in the given graph a priori 
essentially represent their statistical power. As a rule of thumb [3], 
edges with confidence at least (i/'>0.8) were deemed significant. 
In a recent study [41], we proposed a noise floor approach in 
order to avoid the ad-hoc choice of and subsequently a 
statistically motivated approach that estimates optimal \\i from 
the cumulative distribution of the confidence values [42]. 
However, in the present study the actual confidence values are 
presented for enhanced clarity. 

Common-effect network motif. The common-effect net- 
work motif, Fig. la, was generated using (1) with (a = 0.5) and 
normally distributed noise {e t ,r}„8 t ). For finite and equal noise 
variance (yj = l,y 2 = 1) at (y,z) the correlation coefficients p xz ,p yz 
were similar and relatively higher than p xy (~0) as expected (2), 



Fig. 2a. In order to investigate the impact of large discrepancies in 
the noise variances, the noise variance across y was increased 
relative to z(y l = 10»y 2 = 1). This resulted in small values of 
Pxz< Pxy relative to p yz , Fig. 2a and resembled (6) as expected. A 
similar analysis with (yj = 1 «y 2 = 10) across y and z resulted in 
small correlation coefficients across the board similar to (4), Fig. 2a. 
Therefore, large discrepancies in noise variances across the nodes 
can have a pronounced effect on the pair-wise dependencies. The 
corresponding partial correlations for the three choices of noise 
variance (yi,y 2 ) are shown in Fig. 2d. For finite equal noise 
variance (yj = 1 ,y 2 = 1 ) at (y,z), the partial correlation p < 0 (3) 
was non-zero in contrast to p xv = 0, rendering the marginally 
independent nodes (x,y) dependent. Increasing the noise variance 
across y relative to z (yj = 10»y 2 = l) resulted in a significant 
increase in p yzx (7) whereas for (yj = 1 «y 2 = 10), all the 
conditional dependencies were rendered negligible (5) preventing 
any reliable conclusion of the network structure, Fig. 2d. For finite 
equal noise variance (yi = l,y 2 = l) at (y,z), GS was able to 
faithfully retrieve the structure of the common-effect motif, Fig. 3a. 
Increasing the noise variance across y(y] = 10»y 2 = 1) relative to 
z, also retrieved the structure faithfully, Fig. 3c. However, 
increasing the noise variance on the common effect node 
z(Y\ = 1 »y 2 = 10) resulted in low confidence values of the edges 
challenging any reliable inference of the network, Fig. 3b. Thus 
the magnitude of the noise variance at the nodes can have a 
pronounced effect on constraint-based structure learning of a 
common-effect network motif. 

Three-chain network motif. The three-chain network 
motif, Fig. lb, was generated using (8) with (a = 0.5) and normally 
distributed noise {e t ,r\ t ,5 t ). For finite and equal noise variance 
(yi = l,y 2 = l) at the nodes (y,z)the correlation coefficients 
Pxy'Pxz>Pyz were significant as expected (9) with p xz representing 
the transitive dependency between x and z, Fig. 2b. In order to 
investigate the impact of large noise variance, the noise variance 
on the mediating node y was increased relative to z 
(Y\ = 10»y 2 = 1). This resulted in small values of p xy ,p xz relative 
to p yz (13) similar to what was observed for the common-effect 
network motif (6) failing to distinguish these structures. On the 
other hand, large noise variance on the terminal node z relative to 
y (y 1 = l«y 2 = 10) resulted in p xy values relatively higher than 
that of p xz and p yz , as expected from Fig. 2b. These results clearly 
demonstrate the non-trivial impact of noise strengths on network 
inference on pairwise dependencies. Partial correlations p xy z and 
p yzx for finite equal noise variance (yj = l,y 2 = 1) were consider- 
ably higher than that of p xz y (0) as expected, since conditioning 
on the mediator y should render marginally dependent nodes (x,z) 
independent. Increasing the noise variance at y relative to z 
(yj = 10»y 2 = 1) and at z relative to y (yj = 1 »y 2 = 10), rendered 
the pairwise and conditional dependencies similar. This is reflected 
by the similar profiles, Figs. 2b and 2e respectively. For finite equal 
noise variance (yi = l,y 2 = l) at (y,z), GS was able to faithfully 
retrieve the underlying undirected graph, Fig. 3d. This is to be 
expected since the Markov equivalent structure of the three-chain 
network motif is the undirected graph (x — y — z). Increasing the 
noise variance across the mediator y relative to z(y l = 10»y 2 = 1) 
resulted in low confidence along (y — z) preventing any reliable 
inference of possible association between these nodes, Fig. 3e. 
Interestingly, increasing the noise variance on the terminal node z 
relative to y(yj = 1 «y 2 = 10) resulted in low confidence along 
(x — y) preventing any reliable inference of possible association 
between these nodes, Fig. 3f. 

Coherent Type-I feed-forward loop network motif. The 
coherent Type-I feed-forward loop network motif, Fig. lc, was 
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generated using (15) with (a = 0.5) and normally distributed noise 
(e/,rj t ,S t ). While one part of the Type-I FFL resembles the 
common-effect motif (x— >z<— y), the other part resembles a three- 
chain (x— »y— »z), Fig. lb. For finite and equal noise variance 
(y 1 = l,y 2 = l) at (y,z) the pairwise (16) and conditional depen- 
dencies (17) were non-zero. Increasing the noise variance across 
yrelativetoz(7[ = 1 0 » y 2 = 1) resulted in pairwise (20) identical to 
those of the common-effect (6) and three-chain motifs ( 1 3) failing 
to distinguish these network structures. This is reflected by 
similar profiles across Figs. 2a, 2b and 2c. On a related note, 
increasing the noise variance across z relative to 
y(y 2 = 10»y 1 = 1) resulted in pairwise (18) and conditional 
dependencies (19) identical to those of the three-chain motif 
(11, 12) failing to distinguish these two distinct network 
structures. Similarities in the pairwise and conditional depen- 
dencies across these motifs are also reflected by similar profiles 
between Figs. 2b and 2c and between Figs. 2e and 2f 
respectively. For finite equal noise variance (yj = 1 ,y 2 = 1 ) at 
(y,z) GS was able to retrieve the undirected edges (x — y — z), 
Fig. 3g. Failure to retrieve the exact structure, Fig. lc, can be 
attributed to the presence of equivalent classes. Increasing the 
noise variance across z relative to y(y x = 1 »y 2 = 10) resulted in 
low confidences along (x — z) and (y — z) relative to (x — y) 
preventing any reliable inference of possible associations along 
(x — z) and (y — z), Fig. 3h. Thus for these choices of noise 
variances it is possible the results of GS for Type 1 FFL 
resembles the structure of the three-chain failing to distinguish 
them. In contrast, increasing the noise variance at y relative to 
z(y l = 10«y 2 = 1) resulted in large edge confidence only along 
y->z and x->z with low edge confidence along (x — y) Fig. 3i 
preventing any reliable inference of the network structure. 



2.4 Application to Molecular Expression Profiles 
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In a recent study [7] , signalling mechanisms between 1 1 
molecules were inferred from single-cell data using flow-cytometry 
in conjunction with Bayesian network structure learning algo- 
rithms. The resulting network was shown to validate existing 
associations as well as discovering novel undocumented associa- 
tions. Of interest, was the sub-network consisting of three 
molecules (PIP2,PIP3,Plcy) weakly connected to the rest of the 
molecules in the network (see Fig. 3 in [7]). The network structure 
inferred from the molecular expression data between these three 
molecules (PIP2,PIP3,Plcy) consisted of the following directed 
edges PIP3-^PIP2, Plcy^PIP3 and Plcy^PIP2. A quick 
inspection would reveal the resemblance of the relationships 
between these three molecules (22) to that of coherent Type-I FFL 
motif (Fig. lc, Case 3) discussed earlier. The expected and the 
inferred relationships along with the influence paths for these three 
molecules can be found in (Table 3, Sachs et al., 2005). While the 
authors acknowledged that the directionality between (Plcy 
—*PIP3, recruitment leading to phosphorylation) inferred from the data 
was opposite to that established in the literature [43] (see 
Supplementary Material, Table I, Sachs et al., 2005), they 
successfully validated (PIP3^>PIP2, precursor-product) and 
(Plcy^>PIP2, direct hydrolysis to IP3) [44,45] (see Supplementary 
Material, Table I, Sachs et al., 2005). While several data sets were 
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Figure 2. The average correlation coefficient and partial correlation estimates across 200 independent realizations of the common- 
effect, three-chain and coherent Type I feed-forward loop network motifs for various choices of noise variances (y, ,y 2 ) are shown in 

(a, d), (b, e) and (c, f) respectively. The x-axis labels correspond to the correlation coefficients (p xy ,p xz ,p yz ) in (a, b, c) and partial correlations 
(p xyz ,p xz , y ,P yz J in (d, e, f) res pect i ve ly . The (circles, squares and triangles) in each of the subplots correspond to noise variances with magnitudes 
(yj = l,y 2 = 1). (yi = 1,72= 10) an d ("/i = 10,y 2 = 1) respectively. The points bounded by the dotted rectangle represent cases that occurred much 
lesser than 80% of the time as significant (z* = 0.001) across 200 independent realizations. 
doi:1 0.1 371 /journal.pone.0080735.g002 
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Figure 3. Bayesian networks inferred using Grow-Shrink algorithm along with Pearson correlation (a* = 0.01) for the three-gene 
network motifs, namely: common-effect (a-c), three-chain (d-f) and coherent type-l feed-forward loop (g-i) for various choices of 
noise variances: (y, = l,y 2 = 1 ),(>'] = 1,72 = 1"), and (y t = 10,y 2 = 1). The confidences of the edges are represented as percentage of the edges 
that persisted across 200 independent realizations. Edges with (i// > 0.80) are shown by solid arrows whereas others (i// < 0.80) are shown by dotted 
arrows. Edges with confidence (</f < 0.05) are deemed noisy and excluded for clarity. 
doi:1 0.1 371 /journal.pone.0080735.g003 



investigated in [7] , we restrict the present study to the unperturbed 
data set comprising the expression of (PIP2,PIP?>,Plcy) across 
853 single cells. Prior to investigating the impact of noise on the 
network inference between the three molecules, we found the 
distribution of the expression levels across the single-cells to be 
positively skewed, indicating large variations in the expression 
estimates across the cells. Interestingly, we also found the variance 
in the expression levels proportional to their average value across 
the molecules (PIP2,PIP3,Plcy) . Box-Cox [46] transforms are 
widely used in literature to minimize the skew in the distribution 
and suppress non-constant variance as a function of magnitude. In 
the present study, we used the log-transform which is the limiting 
case of the classical Box-Cox transform to minimize the skew in 
the distribution of the expression across these three molecules. 
Therefore, the results across the raw as well as the log-transformed 
data are presented. 

Three different networks (Ylk,k = 1,2,3) were investigated. 11] : 
Network inferred from the given data; II2 : Network inferred from 
data generated from the linear model (22) fit to the given data 
without any constraints on the model parameters; II3 : network 
inferred from data generated by the linear model fit (22) to the 
given data with constraint on the noise variance to be equal 
(;.e.y 0 = y 1 =72). The above exercise was repeated for the raw as 
well as the log-transformed protein expression data and the 
corresponding edge confidences were estimated. The approach is 
outlined below. 

• Step 1: Given the expression X nx i of the three molecules 
across n = 853 cells. 



• Step 2: Generate independent realizations X l mxl> ,i = 1 . . .p by 
resampling X nx i(tn<n) with replacement. In the present 
study, we set (m = 800,p = 200). Set each column in X' lnxi to 
zero-mean. 

• Step 3: Set i<-l. 

• Step 4: Infer the network structure from X' my ^ using the GS 
algorithm. Let the resulting network be IT.'] . 

• Step 5: Estimate the parameters (i.e. regression coefficients 
and noise variances (y 0 , 71,72) by fitting the linear model (22) to 
^mx3- Generate Y' mxi , using the estimated model parameters 
and zero-mean i.i.d. noise terms (et,f] t ,5t) sampled from a log- 
normally distributed noise to accommodate for the positive- 
skew in the distribution. Infer the network structure from Y^ x3 
using the GS algorithm. Let the resulting network be Yl' 2 . 

• Step 6: Generate data Z' mxi , using the linear model in Step 5 
with the additional constraint on equal noise variance 
(i.e.y'i =7o; 72 = )'o) i n (22). Infer the network structure from 
Z' mxi using the GS algorithm. Let the resulting network be TT 3 . 

• Step 7: Set 

• Step 8: Repeat Steps 4-7 till i>p. 

• Step 9: Estimate the confidences of the edges for each of the 
networks (11^,^=1,2,3). 

• Step 10: Repeat Steps 1-9 for the log-transformed data with 
normally distributed noise as opposed to log-normally 
distributed noise in Steps 5 and 6. 

Raw Data. The networks (Tlk,k=\,2,3) inferred using the 
raw data for the molecules (PIP2,PIP3,Plcy) are shown in 
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Figure 4. Bayesian networks inferred using Grow-Shrink algorithm from the molecular expression data (PIP2, PIP3, Plcy) with 
sample-size 800 and Pearson correlation (a* = 0.01) are shown in (a-f). Confidences estimated from 200 independent bootstrap realizations 
are shown along the edges. Edges with (1/0O.8O) are shown by solid arrows whereas others (1// <0.80) are shown by dotted arrows. Edges with 
confidence (t^ < 0.05) are deemed noisy and excluded for clarity. The edge confidences of the networks (IIi, 112,113) inferred from the raw data are 
shown in (a), (b) and (c) respectively. Those inferred on the log-transformed data are shown in (d), (e) and (f) respectively. 
doi:10.1371/journal.pone.0080735.g004 



Figs. 4a-4c respectively. Network structures inferred from the raw 
data (IIi, Step 4) and those of the linear model fit (II2, Step 5) 
exhibited considerable similarity as reflected by their edge 
confidences, Figs. 4a and 4b. The confidence was high along 
PIP2~>PIP3 and Plcy->PIP3, and markedly low along 
Plcy-*PIP2, Figs. 4a, 4b. Noise variance estimated from the 
linear model fit (Step 5) of the raw data revealed around a two-fold 

( 7i \ 

difference i.e. — ~ 2.7 + 0.3 . Constraining the noise variance 

V Vl J 
to be equal (i.e.y\ =y2 = 7o) na d a marked effect on the resulting 
network (Step 6) ZZ3, Fig. 4c. The edge confidences were 
considerably high along PIP2— >PIP3 as seen earlier (II 1 and 
II2), Figs. 4a, 4b. However, relatively smaller edge confidence 
along between (P/cy,PZP3)along either directions, Fig. 4c, in 
contrast to Figs. 4a or 4b was also observed. More importantly, 
constraining the noise variance also increased the edge confidences 
between (Plcy,PIP2) along either directions in contrast to those 
shown in Figs. 4a and 4b (i.e. rii,n2). Thus forcing the noise 
variance to be equal had a pronounced effect on the inferred 
network. 

Log-transformed Data. In order to minimize the impact of 
skewness on the conclusions, the entire exercise was repeated on 
the log-transformed data. The resulting networks along with 
confidence of the edges are shown in Figs. 4d-4f. The networks 
(IIjt,fc=l,2) inferred from the log-transformed data (IIi, Step 4) 
and those from data generated on the linear model fit (II2, Step 5) 
along with the edge confidences are shown in Figs. 4d-4e 
respectively. The noise variance estimates from the linear model fit 
to the log-transformed data revealed no marked difference 

^i.e. ^1-1 +0.01^ in contrast to what was observed in the raw 

data. Since there were no marked discrepancies in noise variance, 
forcing the noise variance to be equal (i'.e.yi = Vi = Vo) na d no 
profound effect on the resulting network (II3, Step 6) Fig. 4f as 
expected. This was revealed by the similar edge confidences across 
II2 and /?3 . Furthermore, it is important to note that the networks 
(Ylk,k= 1,2,3) inferred from the log-transformed data unlike those 
from raw data, failed to capture any relationship Plcy and PIP2. 



Discussion 

Real-world entities work in concert as a system and not in 
isolation. Associations between such entities are usually unknown. 
Inferring associations and network structure from data obtained 
across the entities is of great interest across a number of disciplines. 
The recent surge of high-throughput molecular assays in 
conjunction with a battery of algorithms has facilitated validating 
established associations while discovering new ones with the 
potential to assist in novel hypothesis generation. These associa- 
tions and networks have been shown to capture possible causal 
relationships under certain implicit assumptions and proven to be 
useful abstractions of the underlying signaling mechanism. Such 
an understanding can provide system level insights and often 
precedes developing meaningful interventions. Several network 
inference algorithms have been proposed in literature including 
those that depend on pairwise and conditional dependencies. 
However, little attention has been given to the impact of possible 
discrepancies in noise variance across the data obtained across the 
molecular entities. In molecular settings, such discrepancies can be 
attributed to several factors including inherent stochastic mecha- 
nisms, heterogeneity in cell populations, variations in abundance 
of the molecules, variation in binding affinities, sensitivity of the 
measurement device and other experimental artifacts. Under- 
standing the discrepancies in noise variance is critical in order to 
avoid spurious conclusions and an important step prior to 
identifying the source of the noise. 

The present study clearly elucidated the non-trivial impact of 
discrepancies in noise variance on associations and network 
inference algorithms across synthetic as well as experimental data. 
The impact of large discrepancies in noise variance on associations 
and network structure inferred from data generated using linear 
models of popular network motifs and fundamental connections as 
well as those from experimental protein expression profiles were 
investigated. Analytical expressions and simulations were present- 
ed elucidating the non-trivial impact of noise on three popular 
molecular network motifs and fundamental connections (common 
effect, three-chain and coherent Type-I feed-forward loop). It was 
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shown that discrepancies in noise variance can significantly alter 
the results of pairwise dependencies, conditional dependencies as 
well as constraint-based Bayesian network structure learning 
techniques that implicitly rely on tests for conditional indepen- 
dence. As expected, the discrepancies in noise variances was found 
to result in markedly different topologies from those of their noise 
free counterpart challenging reliable inference of the underlying 
network topology. Such discrepancies were also shown to result in 
spurious conclusion of similar structures across markedly distinct 
network topologies. The impact of discrepancies in noise variance 
were also investigated on publicly available single-cell molecular 
expression profiles of a sub-network comprising of three molecules 
(PIP2, PIP3, Plcy) involved in human T-cell signaling. The sub- 
network shared considerable resemblance to the coherent Type-I 
feed-forward loop. The distribution of the raw expression 
estimates across these three molecules was positively skewed 
indicating large variations in the expression estimates across the 
single-cells. Variance about the average expression across the three 
molecules was found to be markedly different and proportional to 
their average values. Several factors can contribute to such 
discrepancies including: abundance of these molecules, antibody 
binding characteristics, uncertainty due to possible overlap in the 
wavelengths corresponding to the colors tagged to the molecules. 
In the present study, a linear model was fit to the molecular 
expression data. Parameter estimates from the linear model 
indicated significant discrepancies in the noise variances across the 
molecules. Adjusting for these discrepancies in the model was 
shown to significantly affect the edge confidences of the resulting 
networks, hence the topology. The results were presented on the 
raw molecular expression data as well as its log-transformed 
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